- LLM chains enable complex tasks by decomposing work into a sequence of subtasks. Similarly, the more established techniques of crowdsourcing workflows decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space covers a designer's objectives and the tactics used to build workflows. We then surface strategies that mediate how workflows use tactics to achieve objectives. To explore how techniques from crowdsourcing may apply to chaining, we adapt crowdsourcing workflows to implement LLM chains across three case studies: creating a taxonomy, shortening text, and writing a short story. From the design space and our case studies, we identify takeaways for effective chain design and raise implications for future research and development. (Free, publicly accessible full text available June 30, 2026.)
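The abstract describes decomposing a task into LLM subtasks the way crowdsourcing workflows decompose work for crowdworkers. Below is a minimal sketch of such a chain for one of the named case studies (shortening text), assuming a hypothetical call_llm(prompt) helper that wraps whichever model API is in use; none of these names come from the paper's artifacts.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an HTTP request to an LLM API)."""
    raise NotImplementedError("wire this to your LLM provider")


def shorten_text(document: str, target_words: int = 100) -> str:
    # Subtask 1: extract the essential points, separating content from style.
    key_points = call_llm(
        f"List the essential points of the following text, one per line:\n\n{document}"
    )
    # Subtask 2: rewrite from the extracted points under a length budget.
    draft = call_llm(
        f"Write a coherent summary of at most {target_words} words using only "
        f"these points:\n\n{key_points}"
    )
    # Subtask 3: verification, mirroring quality-control tactics from
    # crowdsourcing workflows: check the draft against the source and revise.
    return call_llm(
        "Compare the summary to the source. If any essential point is missing, "
        "revise the summary; otherwise return it unchanged.\n\n"
        f"Source:\n{document}\n\nSummary:\n{draft}"
    )
```

Each subtask produces an intermediate artifact that the next step consumes, which is what lets errors be caught and corrected between steps rather than compounding inside a single prompt.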
- Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised fine-tuning or RLHF, but requires prohibitively large datasets for new ad hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The method iteratively constructs pairwise preference relationships between these LLM-generated samples and expert demonstrations, potentially including comparisons between different training checkpoints. These constructed preference pairs are then used to train the model with a preference optimization algorithm (e.g., DPO). We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, we find that DITTO's win rates outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs. (Free, publicly accessible full text available April 25, 2026.)
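The core of the construction described in the abstract is purely combinatorial: every demonstration is treated as preferred over every model sample, with optional checkpoint-vs-checkpoint comparisons. The sketch below illustrates that pair construction under those stated assumptions; the function names are illustrative stand-ins, not the authors' code, and the resulting pairs would feed a DPO-style optimizer.

```python
import itertools
import random


def build_preference_pairs(demonstrations, checkpoint_samples):
    """Return (chosen, rejected) pairs for a DPO-style preference optimizer.

    demonstrations: list[str] -- user-provided expert outputs.
    checkpoint_samples: dict[int, list[str]] -- LLM samples keyed by
        checkpoint index, with larger indices meaning later checkpoints.
    """
    pairs = []
    # Every demonstration is preferred over every model-generated sample.
    all_samples = itertools.chain.from_iterable(checkpoint_samples.values())
    for demo, sample in itertools.product(demonstrations, all_samples):
        pairs.append((demo, sample))
    # Optionally, samples from later checkpoints are preferred over samples
    # from earlier ones (the between-checkpoint comparisons the abstract notes).
    ids = sorted(checkpoint_samples)
    for earlier, later in itertools.combinations(ids, 2):
        for newer in checkpoint_samples[later]:
            for older in checkpoint_samples[earlier]:
                pairs.append((newer, older))
    random.shuffle(pairs)
    return pairs


# Toy usage: two demonstrations, samples from two checkpoints.
pairs = build_preference_pairs(
    ["expert email A", "expert email B"],
    {0: ["early model draft"], 1: ["later model draft"]},
)
```

Because the comparisons are generated from the user's own demonstrations rather than collected from annotators, the preference data is cheap, which is what makes the iterated loop (sample, pair, optimize, resample) practical at <10 demonstrations.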
- Mounting evidence indicates that the artificial intelligence (AI) systems that rank our social media feeds bear nontrivial responsibility for amplifying partisan animosity: negative thoughts, feelings, and behaviors toward political out-groups. Can we design these AIs to consider democratic values, such as mitigating partisan animosity, as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with an application to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes with which to train such models; however, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for large language models. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and we test this model across three studies. In Study 1, we test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating (alpha=.895) social media posts with anti-democratic attitude scores and testing several feed-ranking conditions based on these scores. Removal (d=.20) and downranking (d=.25) reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with the manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and we again find that downranking the feed with the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy for drawing on social science theory and methods to mitigate societal harms in social media AIs.
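The downranking intervention amounts to reranking a feed by a score that penalizes posts the democratic attitude model flags. The sketch below shows one plausible form of that reranking; the score_antidemocratic helper, the linear penalty, and the engagement scores are all assumptions for illustration, not the paper's published ranking rule.

```python
def score_antidemocratic(post: str) -> float:
    """Placeholder for the LLM-based democratic attitude model, which would
    prompt a model with the codebook-derived rubric and return a score in [0, 1]."""
    raise NotImplementedError("prompt an LLM with the construct's rubric")


def downrank_feed(posts, engagement_scores, penalty: float = 1.0):
    """Rerank a feed by engagement, penalized by the societal objective.

    posts: list[str]; engagement_scores: list[float], parallel to posts.
    A larger penalty trades predicted engagement for lower partisan animosity.
    """
    def combined(i: int) -> float:
        return engagement_scores[i] - penalty * score_antidemocratic(posts[i])

    order = sorted(range(len(posts)), key=combined, reverse=True)
    return [posts[i] for i in order]
```

The removal condition from Study 1 would instead filter posts whose score exceeds a threshold before ranking; downranking keeps the posts visible but moves them lower in the feed.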